Fix regex to catch HTML tags #398

Wauplin · 2023-09-20T13:47:56Z

This PR fixes the CI for https://github.com/huggingface/course and in particular ./es/chapter5/5.mdx.
Related to this Slack thread (internal) and this failing CI.

The failing doc had this form:

10K<n<100K

<Tip>

This is a tip.

</Tip>

in which the _re_lt_html regex detected <n<100K \n<Tip> as a single HTML tag. The _re_lt_html regex is meant to detect the < characters that are not part of an HTML tag.

I fixed the regex for this use case and added a regression test for it. All previous unit tests are still passing, so hopefully it doesn't break anything in the wild. I also took the liberty of adding verbose mode to explain a bit more what the regex is doing (took me a bit of time to remember 🙄).
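
For readers who want to see the failure mode concretely, here is a minimal sketch. It is not the actual _re_lt_html pattern from this PR: the names NAIVE_TAG, TAG_OR_LT and escape_lone_lt are hypothetical and only illustrate the idea. The key difference is that the tag body excludes `<` as well as `>`, so a tag can no longer swallow a following `<`.

```python
import re

# Hypothetical naive pattern: the tag body only excludes ">", so it can
# run across other "<" characters and across newlines.
NAIVE_TAG = re.compile(r"</?[a-z][^>]*>", re.IGNORECASE)

# Hypothetical stricter pattern, written in verbose mode so each part
# can carry a comment. Not the pattern used in doc-builder.
TAG_OR_LT = re.compile(
    r"""
    (?P<tag>
        <                 # opening angle bracket
        /?                # optional slash for closing tags
        [a-z][a-z0-9]*    # tag name
        [^<>]*            # attributes: anything except angle brackets
        >                 # closing angle bracket
    )
    |
    (?P<lt><)             # any remaining opening bracket is a lone one
    """,
    re.IGNORECASE | re.VERBOSE,
)


def escape_lone_lt(text):
    """Replace every "<" that does not start an HTML tag with "&lt;"."""
    return TAG_OR_LT.sub(lambda m: m.group("tag") or "&lt;", text)


doc = "10K<n<100K\n\n<Tip>\n\nThis is a tip.\n\n</Tip>"

# The naive pattern treats everything from "<n" up to "<Tip>" as one tag.
print(repr(NAIVE_TAG.search(doc).group()))  # '<n<100K\n\n<Tip>'

# The stricter pattern escapes the two lone "<" and keeps the real tags.
print(repr(escape_lone_lt(doc)))  # '10K&lt;n&lt;100K\n\n<Tip>\n\nThis is a tip.\n\n</Tip>'
```

This sketch deliberately ignores cases such as a `>` inside an attribute value, HTML comments, and `<!DOCTYPE ...>`, which a production regex would need to consider.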

cc @mishig25 @MKhalusova @xenova

(also related to #373 and #394 which introduced and modified this regex)

Wauplin · 2023-09-20T14:12:46Z

CI was failing on an unrelated test:

tests/test_autodoc.py:283: AssertionError
=========================== short test summary info ============================
FAILED tests/test_autodoc.py::AutodocTester::test_document_object - AssertionError: '\n<d[221 chars]ters>[{"name": "*args", "val": ""}, {"name": "[462 chars]\n\n' != '\n<d[221 chars]ters>""</parameters></docstring>\n\nBase class[401 chars]\n\n'
Diff is 1304 characters long. Set self.maxDiff to None to see it.
========================= 1 failed, 83 passed in 4.21s =========================

It is due to this commit on transformers that added arguments to the ModelInfo object.

I fixed the expected result in ec14a49.

MKhalusova · 2023-09-20T14:25:27Z

Thank you for fixing this!

mishig25 · 2023-09-20T14:32:14Z

not to forget this dev change: 8c6163b

xenova

Green means go! 🚀🚢 (🟢 for Transformers.js)

just a reminder to remove that dev change

This reverts commit 8c6163b.

mishig25

lgtm !

Fix regex to catch HTML tags

b6be62a

Wauplin requested review from mishig25 and xenova September 20, 2023 13:47

Wauplin added 2 commits September 20, 2023 15:49

comments

b7be025

fix unrelated test

ec14a49

temporary doc test

8c6163b

mishig25 mentioned this pull request Sep 20, 2023

[wip: test doc builder fix-lt-html-regex] huggingface/transformers#26295

Closed

xenova approved these changes Sep 20, 2023

Revert "temporary doc test"

ea5a296

This reverts commit 8c6163b.

Wauplin mentioned this pull request Sep 20, 2023

Add doc_builder_revision as workflow input to ease debugging #399

Merged

mishig25 approved these changes Sep 21, 2023

mishig25 merged commit daaaf9a into main Sep 21, 2023
4 checks passed

mishig25 deleted the fix-lt-html-regex branch September 21, 2023 08:21
